Abstract

We present three semi-streaming algorithms for Maximum Bipartite Matching with one 
and two passes. Our one-pass semi-streaming algorithm is deterministic and returns a matching 
of size at least 1/2 + 0.005 times the optimal matching size in expectation, assuming that edges 
arrive one by one in (uniform) random order. Our first two-pass algorithm is randomized and 
returns a matching of size at least 1/2 + 0.019 times the optimal matching size in expectation 
(over its internal random coin flips) for any arrival order. These two algorithms apply the 
simple Greedy matching algorithm several times on carefully chosen subgraphs as a subroutine. 
Furthermore, we present a two-pass deterministic algorithm for any arrival order returning a 
matching of size at least 1/2 + 0.019 times the optimal matching size. This algorithm is built 
on ideas from the computation of semi-matchings. 

1 Introduction 

Streaming. Classical algorithms assume random access to the input. This is a reasonable
assumption until one is faced with massive data sets as in bioinformatics for genome decoding, Web
databases for the search of documents, or network monitoring. The input may then be too large
to fit into the computer's memory. A typical situation is a continuous flow of traffic logs sent to
a router. Streaming algorithms sequentially scan the input piece by piece in one pass, while using
sublinear memory space. The analysis of Internet traffic [2] was one of the first applications of
such algorithms. A similar but slightly different situation arises when the input is recorded on an
external storage device where only sequential access is possible, such as optical disks, or even hard
drives. Then a small number of passes, ideally constant, can be performed.

By sublinear memory one ideally means memory that is polylogarithmic in the size of the input.
Nonetheless, polylogarithmic memory is often too restrictive for graph problems: as shown in [7],
deciding basic graph properties such as bipartiteness or connectivity already requires $\Omega(n)$ space.
Muthukrishnan [16] initially mentioned massive graphs as typical examples where one assumes a
semi-external model, that is, not the entire graph but the vertex set can be stored in memory. In
that model, an $n$-vertex graph is given by a stream of edges arriving in arbitrary order. A
semi-streaming algorithm has memory space $O(n \,\mathrm{polylog}\, n)$, and the graph vertices are usually known
before processing the stream of edges.

Matching. In this paper we focus on an iconic graph problem: finding large matchings. In the
semi-streaming model, the problem was primarily addressed by Feigenbaum, Kannan, McGregor,
Suri and Zhang [6]. By now, a variety of semi-streaming matching algorithms exist for particular
settings (unweighted/weighted, bipartite/general graphs). Most works, however, consider
the multipass scenario [H d] where the goal is to find a $(1 - \epsilon)$-approximation while minimizing
the number of passes. The techniques are based on finding augmenting paths, and, recently,
linear programming was also applied. Ahn and Guha [1] provide an overview of the current best
algorithms. In this paper, we also take the augmenting paths route.

In the one-pass setting, in the unweighted case, the greedy matching algorithm is still the best
known algorithm as far as we know. (We note that in the weighted case, progress was made [17],
[5], but when the edges are unweighted those algorithms are of no help.) The greedy matching
algorithm constructs a matching in the following online fashion: starting with an empty matching $M$, upon
arrival of edge $e$, it adds $e$ to $M$ if $M + e$ remains a matching. A maximal matching is a matching
that cannot be enlarged by adding another edge to it. It is well known that the cardinality of
a maximal matching is at least half of the cardinality of a maximum matching. By construction,
the greedy matching is maximal, so $M$ is a 1/2-approximation of any maximum matching $M^*$, that
is, $|M| \ge |M^*|/2$. The starting point of this paper was to address the following question: Is the
greedy algorithm best possible, or is it possible to get an approximation ratio better
than 1/2?
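As an illustration of the greedy rule just described, here is a minimal Python sketch. The function name `greedy_matching` and the representation of edges as pairs `(a, b)` with distinct labels for the two sides are assumptions made for this example only. The sample stream is a path of length 3 whose middle edge arrives first, which is the standard case where the 1/2 bound is tight.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def greedy_matching(stream: Iterable[Edge]) -> List[Edge]:
    """Single pass over the stream: keep an edge iff both endpoints are still free."""
    matched: Set[Hashable] = set()   # vertices covered by the current matching
    matching: List[Edge] = []
    for a, b in stream:
        if a not in matched and b not in matched:
            matching.append((a, b))
            matched.update((a, b))
    return matching


# Path b2 - a1 - b1 - a2: the middle edge (a1, b1) arrives first and blocks
# both other edges, so Greedy keeps 1 edge while the optimum has 2.
bad_order = [("a1", "b1"), ("a1", "b2"), ("a2", "b1")]
good_order = [("a1", "b2"), ("a2", "b1"), ("a1", "b1")]
print(greedy_matching(bad_order))
print(greedy_matching(good_order))
```

The only state kept is the set of matched vertices and the matching itself, which is why Greedy fits in semi-streaming memory.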

In fact, a very recent result [8] rules out the possibility of any one-pass semi-streaming algorithm
for Maximum Bipartite Matching (MBM) with approximation ratio better than 2/3, since that
would require memory space $n^{1+\Omega(1/\log\log n)}$. Nevertheless, there is still room between 1/2 and 2/3.

To get an approximation ratio better than 1/2, prior semi-streaming algorithms require at least
3 passes. For instance, the algorithm of [4] can be run in 3 passes to produce a matching that is
strictly better than a 1/2-approximation.

Random order of edge arrivals. The behavior of the greedy matching algorithm has been 
studied extensively in a variety of settings. The most relevant reference [3] considers a (uniform) 
random order of edge arrivals. In that setting, Dyer and Frieze showed that the expected
approximation ratio is still 1/2 for some graphs (their example can be extended to bipartite graphs), but
can be better for particular graph classes such as planar graphs and forests. 

In the context of streaming and semi-streaming algorithms, the model of random order arrival
was first studied for the problems of sorting and selecting in limited space by Munro and
Paterson [15]. Then Guha and McGregor [9] gave an exponential separation between random order
and adversarial order models. One justification of the random order model is to understand why
certain problems do not admit memory-efficient streaming algorithms in theory, while in practice,
heuristics are often sufficient.

Other related work. MBM was also intensively studied in the online setting, where nodes
from one side arrive in adversarial order together with all their incident edges. In this model, the
decision to take or discard an edge has to be taken before accessing the edges of the next vertex.
The well-known randomized algorithm by Karp and Vazirani [11] achieves an approximation ratio
of $(1 - 1/e)$ for bipartite graphs where all nodes from one side are known in advance and the nodes
from the other side arrive online. They prove that their bound is optimal in the worst case. This
barrier was broken only recently by modifying the worst case assumption (worst input graph and
worst arrival order) to assume that, although the graph itself is worst-case, the arrival order is
according to some (known or unknown) distribution [10], [12].

Our results. In this paper we present algorithms for settings in which we can beat 1/2. We
design semi-streaming algorithms for MBM with one and two passes. Our one-pass semi-streaming
algorithm is deterministic and achieves an expected approximation ratio 1/2 + 0.005 for any graph
(Theorem 1), but has to assume that the edges arrive one by one in (uniform) random order. Our
two two-pass semi-streaming algorithms do not need the random order assumption. We present a
randomized two-pass algorithm with expected approximation ratio 1/2 + 0.019 over its internal
random coin flips, for any graph and for any arrival order (Theorem 3). Furthermore, we present
a deterministic counterpart with the same approximation ratio for any graph and any arrival order
(Theorem 4).

Techniques. The one-pass algorithm as well as the randomized two-pass algorithm each apply
the greedy matching algorithm three times, on different and carefully chosen subgraphs. The
deterministic two-pass algorithm is slightly more complicated as it uses, besides the greedy algorithm,
a subroutine that computes a particular semi-matching.

General idea common to all our algorithms: If we had three passes at our disposal (see for
instance Algorithm 2 in [6]), we could use one pass to build a maximal matching $M_0$ between the
two sides $A$ and $B$ of the bipartition, and a second pass to find a matching $M_1$ between the $A$ vertices
matched in $M_0$ and the $B$ vertices that are free w.r.t. $M_0$, whose combination with edges of $M_0$
forms paths of length 2. Finally, we could use a third pass to find a matching $M_2$ between $B$ vertices
matched in $M_0$ and $A$ vertices that are free w.r.t. $M_0$, whose combination with $M_0$ and $M_1$ forms
paths of length 3 that can be used to augment matching $M_0$. All our algorithms simulate these 3
passes in fewer passes.
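The three-pass idea can be sketched as follows. This is only an illustrative Python sketch under stated assumptions: the names `greedy` and `three_pass_matching` are hypothetical, the stream is a list of pairs `(a, b)` with `a` on the A side and `b` on the B side, and re-reading the list stands in for one pass each.

```python
def greedy(stream, allowed):
    """Greedy matching restricted to the edges accepted by the predicate `allowed`."""
    matched, out = set(), []
    for a, b in stream:
        if allowed(a, b) and a not in matched and b not in matched:
            out.append((a, b))
            matched.update((a, b))
    return out


def three_pass_matching(edges):
    # Pass 1: maximal matching M0.
    m0 = greedy(edges, lambda a, b: True)
    mate_of_a = dict(m0)                      # a -> its M0 partner b
    mate_of_b = {b: a for a, b in m0}         # b -> its M0 partner a
    # Pass 2: M1 between A vertices matched in M0 and B vertices free w.r.t. M0.
    m1 = greedy(edges, lambda a, b: a in mate_of_a and b not in mate_of_b)
    wing = dict(m1)                           # matched a -> free b' (length-2 path b - a - b')
    # Pass 3: M2 between matched B vertices whose M0 partner has a wing, and free A vertices.
    m2 = greedy(
        edges,
        lambda a, b: a not in mate_of_a and b in mate_of_b and mate_of_b[b] in wing,
    )
    # Augment: every M2 edge closes a path a_free - b - a - b' of length 3.
    result = set(m0)
    for a_free, b in m2:
        a = mate_of_b[b]
        result.remove((a, b))
        result.update({(a, wing[a]), (a_free, b)})
    return result


# Greedy alone keeps only (a1, b1); the two extra passes find the augmenting path.
print(three_pass_matching([("a1", "b1"), ("a1", "b2"), ("a2", "b1")]))
```

Each augmentation removes one edge of M0 and adds two, so the output has size |M0| + |M2|.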

One-pass algorithm for random arrival order: To simulate this with a single pass, we split the
sequence of arrivals $[1, m]$ into three phases $[1, \alpha m]$, $(\alpha m, \beta m]$, and $(\beta m, m]$ and build $M_0$ during
the first phase, $M_1$ during the second phase, and $M_2$ during the third phase. Of course, we see
only a subset of the edges for each phase, but thanks to the random order arrival, these subsets are
random, and, intuitively, we lose only a constant fraction in the sizes of the constructed matchings.
As it turns out, the intuition can be made rigorous, as long as the first matching $M_0$ is maximal
or close to maximal. We observe that, if the greedy algorithm, executed on the entire sequence
of edges, produces a matching that is not much better than a 1/2-approximation of the optimal
maximum matching, then that matching is built early on. More precisely (Lemma 4), if the greedy
matching on the entire graph is no better than a $(1/2 + \epsilon)$-approximation, then after seeing a mere
one third of the edges of the graph, the greedy matching is already a $(1/2 - \epsilon)$-approximation, so it
is already close to maximal.

Randomized two-pass algorithm for any arrival order: Assume a bipartite graph $(A, B, E)$
comprising a perfect matching. If $A'$ is a small random subset of $A$, then, regardless of the arrival
order, the greedy algorithm that constructs a greedy matching between $A'$ and $B$ (that is, the
greedy algorithm restricted to the edges that have an endpoint in $A'$) will find a matching that
is near-perfect, that is, almost every vertex of $A'$ is matched (see Theorem 2 for a slightly more
general version of this statement). This property of the greedy algorithm may be of independent
interest. Then, in one pass we compute a greedy matching $M_0$ and also, via the greedy algorithm,
independently and in parallel a matching $M_1$ between a subset $A' \subseteq A$ and the $B$ vertices. It turns
out that $M_0 \cup M_1$ contains some length 2 paths that can be completed to 3-augmenting paths by
a third matching $M_2$ that we compute in the second pass.

Deterministic two-pass algorithm for any arrival order: Again, assume a bipartite graph
$(A, B, E)$ comprising a perfect matching and some integer $\lambda$. Now greedily add edges $ab$ to a
set $S$ if the degree of $a$ in $S$ is still 0, and the degree of $b$ in $S$ is smaller than $\lambda$. This algorithm computes
an incomplete semi-matching with a degree limitation $\lambda$ on the $B$ nodes. In the first pass, we run
this algorithm in parallel to the greedy matching algorithm for constructing $M_0$. $S$ replaces the
computation of $M_1$, and we will show that there are length 2 paths in $M_0 \cup S$ that can be completed to
3-augmenting paths in the second pass via a further greedy matching $M_2$.
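The greedy rule for the degree-limited semi-matching can be sketched in a few lines of Python. The function name `capped_semi_matching` and the parameter name `cap` (standing for the degree limit on the B nodes) are assumptions for this illustration, and the sketch covers only this subroutine, not the full deterministic two-pass algorithm.

```python
from collections import defaultdict


def capped_semi_matching(stream, cap):
    """Assign each A vertex to at most one B vertex; no B vertex receives more than `cap`."""
    degree_b = defaultdict(int)   # current degree of each B vertex in S
    assigned = {}                 # a -> b for every edge ab in S
    for a, b in stream:
        if a not in assigned and degree_b[b] < cap:
            assigned[a] = b
            degree_b[b] += 1
    return assigned


stream = [("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a3", "b2")]
print(capped_semi_matching(stream, cap=2))
```

With `cap=1` the rule reduces to the plain greedy matching; larger values let several A vertices share one B vertex.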

Extension to general graphs. All algorithms presented in this paper generalize to
non-bipartite graphs. When searching for augmenting paths in general graphs, algorithms have to cope
with the fact that a candidate edge for an augmenting path may form an undesired triangle with
the edge to augment and an optimal edge. In this case, the candidate edge can block the entire
augmenting path. McGregor [13] overcomes this problem by repeatedly sampling bipartite graphs
from the general graph. Such a strategy, however, is not necessary for our randomized algorithms.
Since these make use of randomness (either over all input sequences, or over internal random coins), 
we show that undesired triangles simply do not appear too often allowing our techniques to still 
work. For our deterministic two-pass algorithm, a direct combinatorial argument can be used to 
bound the number of triangles. 

2 Preliminaries 

Notations and Definitions. Let $G = (A, B, E)$ be a bipartite graph with $V = A \cup B$, $n = |V|$
vertices and $m = |E|$ edges. For an edge $e \in E$ with end points $a \in A$ and $b \in B$, we denote $e$ by
$ab$. The input $G$ is given as a sequence of edges arriving one by one in some order. Let $\Pi(G)$ be
the set of all edge sequences of $G$.

Definition 1 (Semi-Streaming Algorithm). A $k$-pass semi-streaming algorithm $\mathcal{A}$ with processing
time per letter $t$ is an algorithm such that, for every input stream $\pi \in \Pi(G)$ encoding a graph $G$
with $n$ vertices: (1) $\mathcal{A}$ performs in total at most $k$ passes on stream $\pi$, (2) $\mathcal{A}$ maintains a memory
space of size $O(n \,\mathrm{polylog}\, n)$, (3) $\mathcal{A}$ has running time $O(t)$ per edge.

For a subset of edges $F$, we denote by $\mathrm{opt}(F)$ a matching of maximum size in the graph $G$
restricted to edges $F$. We may write $\mathrm{opt}(G)$ for $\mathrm{opt}(E)$, and we use $M^* = \mathrm{opt}(G)$. We say that an
algorithm $\mathcal{A}$ computes a $c$-approximation of the maximum matching if $\mathcal{A}$ outputs a matching $M$
such that $|M| \ge c \cdot |\mathrm{opt}(G)|$. We consider two potential sources of randomness: from the algorithm
and from the arrival order. Nevertheless, we will always consider worst case against the graph. For
each situation, we relax the notion of $c$-approximation so that the expected approximation ratio is
$c$, that is $\mathbb{E}|M| \ge c \cdot |\mathrm{opt}(G)|$ where the expectation can be taken either over the internal random
coins of the algorithm, or over all possible arrival orders.

For simplicity, we assume from now on that $A$, $B$ and $m = |E|$ are given in advance. For two
sets $S_1$ and $S_2$, we write $S_1 \oplus S_2$ for the symmetric difference $(S_1 \setminus S_2) \cup (S_2 \setminus S_1)$ of the two sets.

For an input stream $\pi \in \Pi(G)$, we write $\pi[i]$ for the $i$-th edge of $\pi$, and $\pi[i, j]$ for the subsequence
$\pi[i]\pi[i+1] \ldots \pi[j]$. In this notation, a parenthesis excludes the smallest or respectively largest
element: $\pi(i, j] = \pi[i+1, j]$ and $\pi[i, j) = \pi[i, j-1]$. If $i, j$ are real, $\pi[i, j] := \pi[\lfloor i \rfloor, \lfloor j \rfloor]$, and
$\pi[i] := \pi[\lfloor i \rfloor]$. Given a subset $S \subseteq V$, $\pi|_S$ is the largest subsequence of $\pi$ such that all edges in $\pi|_S$
are among vertices in $S$.

For a set of vertices $S$ and a set of edges $F$, let $S(F)$ be the subset of vertices of $S$ covered
by $F$. Furthermore, we use the abbreviation $\overline{S(F)} := S \setminus S(F)$. For $S_A \subseteq A$ and $S_B \subseteq B$, we
write $\mathrm{opt}(S_A, S_B)$ for $\mathrm{opt}(G|_{S_A \times S_B})$, that is a maximum matching in the subgraph of $G$ induced
by vertices $S_A \cup S_B$.

Maximal and Greedy Matchings. Formally, the greedy matching algorithm Greedy on
stream $\pi$ is defined as follows: Starting with an empty matching $M$, upon arrival of an edge $\pi[i]$,
Greedy inserts $\pi[i]$ into $M$ if $M + \pi[i]$ remains a matching. Denote by $\mathrm{Greedy}(\pi)$ the matching $M$ after the stream $\pi$ has been fully processed.
By maximality, $|\mathrm{Greedy}(\pi)| \ge \frac12 |\mathrm{opt}(G)|$. Greedy can be seen as a semi-streaming algorithm for
MBM with expected approximation ratio $\frac12$ and $O(1)$ processing time per letter. We now state
some preliminary properties. Lemma 1 shows that a maximal matching that is far from the optimal
matching in value must also be far from the optimal matching in Hamming distance.

Lemma 1. Let $M^* = \mathrm{opt}(G)$, and let $M$ be a maximal matching of $G$. Then
$|M \cap M^*| \le 2(|M| - \frac12 |M^*|)$.

Proof. This is a piece of elementary combinatorics. Since $M$ is a maximal matching, for every edge
$e$ of $M^* \setminus M$, at least one of the two endpoints of $e$ is matched in $M \setminus M^*$, and so
$|M \setminus M^*| \ge \frac12 |M^* \setminus M|$. We have $|M^* \setminus M| = |M^*| - |M^* \cap M|$. Combining gives

$$|M \cap M^*| = |M| - |M \setminus M^*| \le |M| - \tfrac12 |M^* \setminus M| = |M| - \tfrac12 \left(|M^*| - |M^* \cap M|\right),$$

which implies the Lemma. □

Lemma 2 shows that maximal matchings that are small in size contain many edges that are
3-augmentable. Given a maximum matching $M^* = \mathrm{opt}(G)$, and a maximal matching $M$, we say
that an edge $e \in M$ is 3-augmentable if the removal of $e$ from $M$ allows the insertion of two edges
$f, g$ from $M^* \setminus M$ into $M$.

Lemma 2. Let $\epsilon > 0$. Let $M^* = \mathrm{opt}(G)$, let $M$ be a maximal matching of $G$ s.t. $|M| \le (\frac12 + \epsilon)|M^*|$.
Then $M$ contains at least $(\frac12 - 3\epsilon)|M^*|$ 3-augmentable edges.

Proof. The proof is folklore. Let $k_i$ denote the number of paths of length $i$ in $M \oplus M^*$. Since $M^*$
is maximum, it has no augmenting path, so all odd length paths are augmenting paths of $M$. Since
$M$ is maximal, there are no augmenting paths of length 1, so $k_1 = 0$. Every even length path and
every cycle has an equal number of edges from $M$ and from $M^*$. A path of length $2i + 1$ has $i$
edges from $M$ and $i + 1$ edges from $M^*$. Hence

$$|M^*| - |M| = \sum_{i \ge 1} k_{2i+1} \le k_3 + \frac12 \sum_{i \ge 2} i\, k_{2i+1} = \frac12 k_3 + \frac12 \sum_{i \ge 1} i\, k_{2i+1} \le \frac12 k_3 + \frac12 |M|.$$

Thus, using our assumption on $|M|$,

$$k_3 \ge 2|M^*| - 3|M| \ge 2|M^*| - \left(\tfrac32 + 3\epsilon\right)|M^*|,$$

implying the Lemma. □
3 One-pass algorithm on random order

We now discuss a one-pass semi-streaming algorithm for MBM with an expected approximation
ratio strictly greater than $\frac12$.

Algorithm. Here is a key observation in the random order setting: if Greedy performs badly
on some input graph $G$, then most edges of Greedy appear within the first constant fraction of the
stream, see Lemma 4. Our strategy is hence to run Greedy on a first part of the stream, and then,
on the remaining part of the stream, we focus on searching for 3-augmenting paths.

Let $M_0$ denote the matching computed by Greedy on the first part of the stream. Assume that
Greedy performs badly on the input graph $G$. Lemma 2 tells us that almost all of the edges of $M_0$
are 3-augmentable. To find 3-augmenting paths, in the next part of the stream we run Greedy to
compute a matching $M_1$ between $B(M_0)$ and $\overline{A(M_0)}$. The edges in $M_1$ serve as one of the edges of
3-augmenting paths (from the $B$-side of $M_0$). In Lemma 5 we show that we find a constant fraction
of those. In the last part of the stream, again with the help of Greedy, we compute a matching $M_2$
that completes the 3-augmenting paths. Lemma 8 shows that by this strategy we find many
3-augmenting paths. Then, either a simple Greedy matching performs well on $G$, or else we can find
many 3-augmenting paths and use them to improve $M_0$: see the main theorem, Theorem 1, whose
proof is deferred to the end of this section. An illustration is provided in Figure 1 in Appendix A.

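Algorithm 1 Matching in one pass
1: $\alpha \leftarrow 0.4312$, $\beta \leftarrow 0.7595$
2: $M_G \leftarrow \mathrm{Greedy}(\pi)$
3: $M_0 \leftarrow \mathrm{Greedy}(\pi[1, \alpha m])$, matching obtained by running Greedy on the first $\lfloor \alpha m \rfloor$ edges
4: $F_1 \leftarrow$ complete bipartite graph between $B(M_0)$ and $\overline{A(M_0)}$
5: $M_1 \leftarrow \mathrm{Greedy}(F_1 \cap \pi(\alpha m, \beta m])$, matching obtained by running Greedy on edges $\lfloor \alpha m \rfloor + 1$
   through $\lfloor \beta m \rfloor$ that intersect $F_1$
6: $A' \leftarrow \{a \in A \mid \exists b \in B(M_1) : ab \in M_0\}$
7: $F_2 \leftarrow$ complete bipartite graph between $A'$ and $\overline{B(M_0)}$
8: $M_2 \leftarrow \mathrm{Greedy}(F_2 \cap \pi(\beta m, m])$, matching obtained by running Greedy on edges $\lfloor \beta m \rfloor + 1$
   through $m$ that intersect $F_2$
9: $M \leftarrow$ matching obtained from $M_0$ augmented by $M_1 \cup M_2$
10: return larger of the two matchings $M_G$ and $M$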



Observe that our algorithm only uses memory space $O(n \log n)$. Indeed, each of the subsets $F_1$ and
$F_2$ can be compactly represented by two $n$-bit arrays, and checking if an edge of $\pi$ belongs to one
of them can be done within time $O(1)$ from that compact representation.
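The following Python sketch shows one way Algorithm 1 could be organized as a single loop over the stream. It is an illustration under assumptions, not a reference implementation: the function name `one_pass_matching`, the dictionary and set state (used here instead of bit arrays), and the representation of the stream as a list of pairs `(a, b)` are all choices made for this example, and the number of edges is taken from the list length in line with the assumption that m is known in advance.

```python
import math


def one_pass_matching(edges, alpha=0.4312, beta=0.7595):
    m = len(edges)                                  # m is assumed known in advance
    cut1, cut2 = math.floor(alpha * m), math.floor(beta * m)
    mg, mg_used = [], set()                         # M_G: Greedy on the whole stream
    mate_of_a, mate_of_b = {}, {}                   # M_0, stored in both directions
    wing, wing_used = {}, set()                     # M_1: b in B(M_0) -> free a
    m2, m2_used = [], set()                         # M_2: (a in A', free b)
    for t, (a, b) in enumerate(edges, start=1):
        if a not in mg_used and b not in mg_used:
            mg.append((a, b))
            mg_used.update((a, b))
        if t <= cut1:                               # phase 1: build M_0
            if a not in mate_of_a and b not in mate_of_b:
                mate_of_a[a], mate_of_b[b] = b, a
        elif t <= cut2:                             # phase 2: edge must lie in F_1
            if (b in mate_of_b and a not in mate_of_a
                    and b not in wing and a not in wing_used):
                wing[b] = a
                wing_used.add(a)
        else:                                       # phase 3: edge must lie in F_2
            if (a in mate_of_a and mate_of_a[a] in wing and b not in mate_of_b
                    and a not in m2_used and b not in m2_used):
                m2.append((a, b))
                m2_used.update((a, b))
    # Augment M_0 along the paths b_free - a - b - wing[b].
    augmented = set(mate_of_a.items())
    for a, b_free in m2:
        b = mate_of_a[a]
        augmented.remove((a, b))
        augmented.update({(a, b_free), (wing[b], b)})
    return augmented if len(augmented) > len(mg) else set(mg)


# Tiny example with phase boundaries after the first and second edge.
print(one_pass_matching([("a1", "b1"), ("a2", "b1"), ("a1", "b2")], alpha=0.34, beta=0.67))
```

Membership in `F_1` and `F_2` is tested directly from the stored matchings, which mirrors the remark that these edge sets never need to be materialized.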

Theorem 1. Algorithm 1 is a deterministic one-pass semi-streaming algorithm for MBM with
approximation ratio $\frac12 + 0.005$ against (uniform) random order for any graph, and can be implemented
with $O(1)$ processing time per letter.

Analysis. We use the notations of Algorithm 1. Consider $\alpha$ and $\beta$ as variables with
$0 < \alpha \le \frac12 < \beta \le 1$.

Lemma 3. $\forall e = ab \in E$: $\Pr[a \text{ and } b \notin V(M_0)] \le (\frac1\alpha - 1) \Pr[e \in M_0]$.

Proof. Observe: $\Pr[a \text{ and } b \notin V(M_0)] + \Pr[e \in M_0] = \Pr[a \text{ and } b \notin V(M_0 \setminus \{e\})]$, because the two
events on the left hand side are disjoint and their union is the event on the right hand side.
Consider the following probabilistic argument. Take the execution for a particular ordering $\pi$.
Assume that $a$ and $b \notin V(M_0 \setminus \{e\})$ and let $t$ be the arrival time of $e$. If we modify the ordering by
changing the arrival time of $e$ to some time $t' < t$, then we still have $a$ and $b \notin V(M_0 \setminus \{e\})$. Thus,¹

$$\Pr[a \text{ and } b \notin V(M_0 \setminus \{e\})] \le \Pr[a \text{ and } b \notin V(M_0 \setminus \{e\}) \mid e \in \pi[1, \alpha m]].$$

Now, the right-hand side equals $\Pr[e \in M_0 \mid e \in \pi[1, \alpha m]]$, which simplifies into $\Pr[e \in M_0] / \Pr[e \in
\pi[1, \alpha m]]$ since $e$ can only be in $M_0$ if it is one of the first $\alpha m$ arrivals. Then we conclude the Lemma
by the random order assumption $\Pr[e \in \pi[1, \alpha m]] = \alpha$.

□ 

Lemma 4. If $\mathbb{E}_\pi |M_G| \le (\frac12 + \epsilon)|M^*|$, then $\mathbb{E}_\pi |M_0| \ge |M^*|(\frac12 - (\frac1\alpha - 2)\epsilon)$.

Proof. Rather than directly analyzing the number of edges $|M_0|$, we analyze the number of vertices
matched by $M_0$, which is equivalent since $|V(M_0)| = 2|M_0|$.

Fix an edge $e = ab$ of $M^*$. Either $e \in M_0$, or at least one of $a, b$ is matched by $M_0$, or neither
$a$ nor $b$ is matched. Summing over all $e \in M^*$ gives

$$|V(M_0)| \ge 2|M^* \cap M_0| + |M^* \setminus M_0| - \sum_{e = ab \in M^*} \chi[a \text{ and } b \notin V(M_0)],$$

where $\chi[X] = 1$ if the event $X$ happens, otherwise $\chi[X] = 0$. Taking expectations and using
Lemma 3,

$$\mathbb{E}_\pi |V(M_0)| \ge 2\,\mathbb{E}_\pi |M^* \cap M_0| + \mathbb{E}_\pi |M^* \setminus M_0| - \left(\tfrac1\alpha - 1\right) \mathbb{E}_\pi |M^* \cap M_0| = |M^*| - \left(\tfrac1\alpha - 2\right) \mathbb{E}_\pi |M^* \cap M_0|.$$

Since $M_0$ is just a subset of the edges of $M_G$, using Lemma 1 and linearity of expectation,

$$\mathbb{E}_\pi |M^* \cap M_0| \le \mathbb{E}_\pi |M^* \cap M_G| \le 2\left(\mathbb{E}_\pi |M_G| - \tfrac12 |M^*|\right) \le 2\epsilon |M^*|.$$

Combining gives the Lemma. □

Lemma 5. Assume that $\mathbb{E}_\pi |M_G| \le (\frac12 + \epsilon)|M^*|$. Then the expected size of the maximum matching
between the vertices of $A$ left unmatched by $M_0$ and the vertices of $B$ matched by $M_0$ can be bounded
below as follows:

$$\mathbb{E}_\pi |\mathrm{opt}(\overline{A(M_0)}, B(M_0))| \ge |M^*| \left(\tfrac12 - \left(\tfrac1\alpha + 2\right)\epsilon\right).$$



Proof. The size of a maximum matching between $\overline{A(M_0)}$ and $B(M_0)$ is at least the number of
augmenting paths of length 3 in $M_0 \oplus M^*$. By Lemma 2, in expectation, the number of augmenting
paths of length 3 in $M_G \oplus M^*$ is at least $(\frac12 - 3\epsilon)|M^*|$. All of those are augmenting paths of length
3 in $M_0 \oplus M^*$, except for at most $|M_G| - |M_0|$. Hence, in expectation, $M_0$ contains
$(\frac12 - 3\epsilon)|M^*| - (\mathbb{E}_\pi |M_G| - \mathbb{E}_\pi |M_0|)$ 3-augmentable edges. Lemma 4 concludes the proof. □



¹ Formally, we define a map $f$ from the uniform distribution on all orderings to the uniform distribution on all
orderings such that $e \in \pi[1, \alpha m]$: if $e \in \pi[1, \alpha m]$ then $f(\pi) = \pi$ and otherwise $f(\pi)$ is the permutation obtained from
$\pi$ by removing $e$ and re-inserting it at a position picked uniformly at random in $[1, \alpha m]$.
Lemma 6. $\mathbb{E}_\pi |M_1| \ge \frac12 (\beta - \alpha) \left( \mathbb{E}_\pi |\mathrm{opt}(\overline{A(M_0)}, B(M_0))| - \frac{1}{1 - \alpha} \right)$.

Proof. Since Greedy computes a maximal matching which is at least half the size of a maximum
matching,

$$\mathbb{E}_\pi |M_1| \ge \tfrac12\, \mathbb{E}_\pi |\mathrm{opt}(\overline{A(M_0)}, B(M_0)) \cap \pi(\alpha m, \beta m]|.$$

By independence between $M_0$ and the ordering within $(\alpha m, m]$, we see that even if we condition
on $M_0$, we still have that $\pi(\alpha m, \beta m]$ is a random uniform subset of $\pi(\alpha m, m]$. Thus:

$$\mathbb{E}_\pi |\mathrm{opt}(\overline{A(M_0)}, B(M_0)) \cap \pi(\alpha m, \beta m]| = \frac{\beta - \alpha}{1 - \alpha}\, \mathbb{E}_\pi |\mathrm{opt}(\overline{A(M_0)}, B(M_0)) \cap \pi(\alpha m, m]|.$$

We use a probabilistic argument similar to but slightly more complicated than the proof of
Lemma 3. We define a map $f$ from the uniform distribution on all orderings to the uniform
distribution on all orderings such that $e \in \pi(\alpha m, m]$: if $e \in \pi(\alpha m, m]$ then $f(\pi) = \pi$ and otherwise
$f(\pi)$ is the permutation obtained from $\pi$ by removing $e$ and re-inserting it at a position picked
uniformly at random in $(\alpha m, m]$; in the latter case, if this causes an edge $e' = a'b'$, previously arriving
at time $\lfloor \alpha m \rfloor + 1$, to now arrive at time $\lfloor \alpha m \rfloor$ and to be added to $M_0$, we define $M_0' = M_0 \setminus \{e'\}$;
in all other cases we define $M_0' = M_0$. Thus, if in $\pi$ we have $e \in \mathrm{opt}(\overline{A(M_0)}, B(M_0))$, then in
$f(\pi)$ we have $e \in \mathrm{opt}(\overline{A(M_0')}, B(M_0'))$. Since the distribution of $f(\pi)$ is uniform conditioned on
$e \in \pi(\alpha m, m]$:

$$\frac{\Pr[e \in \mathrm{opt}(\overline{A(M_0')}, B(M_0')) \text{ and } e \in \pi(\alpha m, m]]}{\Pr[e \in \pi(\alpha m, m]]} \ge \Pr[e \in \mathrm{opt}(\overline{A(M_0)}, B(M_0))].$$

Using $\Pr[e \in \pi(\alpha m, m]] = 1 - \alpha$ and summing over $e$:

$$\mathbb{E}_\pi |\mathrm{opt}(\overline{A(M_0')}, B(M_0')) \cap \pi(\alpha m, m]| \ge (1 - \alpha)\, \mathbb{E}_\pi |\mathrm{opt}(\overline{A(M_0)}, B(M_0))|.$$

Since $M_0'$ and $M_0$ differ by at most one edge, $|\mathrm{opt}(\overline{A(M_0)}, B(M_0))| \ge |\mathrm{opt}(\overline{A(M_0')}, B(M_0'))| - 1$, and
the Lemma follows. □

Lemma 7. Assume that $\mathbb{E}_\pi |M_G| \le (\frac12 + \epsilon)|M^*|$. Then:

$$\mathbb{E}_\pi |\mathrm{opt}(A', \overline{B(M_0)})| \ge \mathbb{E}_\pi |M_1| - 4\epsilon |M^*|.$$



Proof. $|\mathrm{opt}(A', \overline{B(M_0)})|$ is at least $|M_1|$ minus the number of edges of $M_0$ that are not 3-augmentable.
Since $M_0$ is a subset of $M_G$, the latter term is bounded by the number of edges of $M_G$ that are not
3-augmentable, which by Lemma 2 is in expectation at most $(\frac12 + \epsilon)|M^*| - (\frac12 - 3\epsilon)|M^*| = 4\epsilon |M^*|$. □
Lemma 8. $\mathbb{E}_\pi |M_2| \ge \frac12 \left( (1 - \beta)\, \mathbb{E}_\pi |\mathrm{opt}(A', \overline{B(M_0)})| - 1 \right)$.

Proof. Since Greedy computes a maximal matching which is at least half the size of a maximum
matching,

$$\mathbb{E}_\pi |M_2| \ge \tfrac12\, \mathbb{E}_\pi |\mathrm{opt}(A', \overline{B(M_0)}) \cap \pi(\beta m, m]|.$$

Formally, we define a map $f$ from the uniform distribution on all orderings to the uniform
distribution on all orderings such that $e \in \pi(\beta m, m]$: if $e \in \pi(\beta m, m]$ then $f(\pi) = \pi$ and otherwise
$f(\pi)$ is the permutation obtained from $\pi$ by removing $e$ and re-inserting it at a position picked
uniformly at random in $(\beta m, m]$; in the latter case, if this causes an edge $e' = a'b'$, previously
arriving at time $\lfloor \beta m \rfloor + 1$, to now arrive at time $\lfloor \beta m \rfloor$ and to be added to $M_1$, we define
$A'' = A' \setminus \{M_0(b')\}$; in all other cases we define $A'' = A'$. Thus, if in $\pi$ we have $e \in \mathrm{opt}(A', \overline{B(M_0)})$,
then in $f(\pi)$ we have $e \in \mathrm{opt}(A'', \overline{B(M_0)})$. Since the distribution of $f(\pi)$ is uniform conditioned on
$e \in \pi(\beta m, m]$:

$$\frac{\Pr[e \in \mathrm{opt}(A'', \overline{B(M_0)}) \text{ and } e \in \pi(\beta m, m]]}{\Pr[e \in \pi(\beta m, m]]} \ge \Pr[e \in \mathrm{opt}(A', \overline{B(M_0)})].$$

Using $\Pr[e \in \pi(\beta m, m]] = 1 - \beta$ and summing over $e$:

$$\mathbb{E}_\pi |\mathrm{opt}(A'', \overline{B(M_0)}) \cap \pi(\beta m, m]| \ge (1 - \beta)\, \mathbb{E}_\pi |\mathrm{opt}(A', \overline{B(M_0)})|.$$

Since $A'$ and $A''$ differ by at most one vertex,
$|\mathrm{opt}(A'', \overline{B(M_0)})| \ge |\mathrm{opt}(A', \overline{B(M_0)})| - 1$, and the Lemma follows. □

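We now present the proof of the main theorem, Theorem 1.

Proof of Theorem 1. Assume that $\mathbb{E}_\pi |M_G| \le (\frac12 + \epsilon)|M^*|$. By construction, every $e \in M_2$ completes a
3-augmenting path, hence $|M| \ge |M_0| + |M_2|$. In Lemma 4 we show that
$\mathbb{E}_\pi |M_0| \ge |M^*|(\frac12 - (\frac1\alpha - 2)\epsilon)$. By Lemmas 8 and 7, $|M_2|$ can be related to $|M_1|$:

$$\mathbb{E}_\pi |M_2| \ge \tfrac12 (1 - \beta)\, \mathbb{E}_\pi |\mathrm{opt}(A', \overline{B(M_0)})| - \tfrac12 \ge \tfrac12 (1 - \beta) \left( \mathbb{E}_\pi |M_1| - 4\epsilon |M^*| \right) - \tfrac12.$$

By Lemmas 6 and 5, $|M_1|$ can be related to $|M^*|$:

$$\mathbb{E}_\pi |M_1| \ge \tfrac12 (\beta - \alpha)\, \mathbb{E}_\pi |\mathrm{opt}(\overline{A(M_0)}, B(M_0))| - O(1) \ge \tfrac12 (\beta - \alpha) \left( |M^*| \left( \tfrac12 - \left(\tfrac1\alpha + 2\right)\epsilon \right) \right) - O(1).$$

Combining,

$$\mathbb{E}_\pi |M| \ge |M^*| \left( \tfrac12 - \left(\tfrac1\alpha - 2\right)\epsilon + \tfrac12 (1 - \beta) \left( \tfrac12 (\beta - \alpha) \left( \tfrac12 - \left(\tfrac1\alpha + 2\right)\epsilon \right) - 4\epsilon \right) \right) - O(1).$$

The expected value of the output of the algorithm is at least $\min_\epsilon \max\{(\frac12 + \epsilon)|M^*|, \mathbb{E}_\pi |M|\}$. We
set the right hand side of the above inequality equal to $(\frac12 + \epsilon)|M^*|$. By a numerical search we
optimize parameters $\alpha, \beta$. Setting $\alpha = 0.4312$ and $\beta = 0.7595$, we obtain $\epsilon \approx 0.005$, which proves
the Theorem. □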
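The final numerical step can be checked with a few lines of Python. The sketch below assumes the combined lower bound on E|M|/|M*| has the form 1/2 - (1/α - 2)ε + (1/2)(1 - β)((1/2)(β - α)(1/2 - (1/α + 2)ε) - 4ε), ignores the additive O(1) term, and solves for the ε at which this bound equals 1/2 + ε. The function name `balance_epsilon` is hypothetical.

```python
def balance_epsilon(alpha: float, beta: float) -> float:
    """Solve  c0 - c1 * eps = 1/2 + eps  for eps, where c0 - c1 * eps is the
    combined lower bound on E|M| / |M*| (additive O(1) term dropped)."""
    inv = 1.0 / alpha
    outer = 0.5 * (1.0 - beta)              # factor in front of the M_2 contribution
    inner = 0.5 * (beta - alpha)            # factor in front of the M_1 contribution
    c0 = 0.5 + outer * inner * 0.5          # value of the bound at eps = 0
    c1 = (inv - 2.0) + outer * (inner * (inv + 2.0) + 4.0)   # loss per unit of eps
    return (c0 - 0.5) / (1.0 + c1)


eps = balance_epsilon(0.4312, 0.7595)
print(round(eps, 4))
```

Because the bound is linear in ε, no search over ε is needed; only the choice of α and β requires numerical optimization.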



4 Randomized two-pass algorithm on any order 

We now present a randomized two-pass semi-streaming algorithm for MBM with approximation
ratio strictly greater than $\frac12$. The algorithm relies on a property of the Greedy algorithm that we
discuss before the presentation of the algorithm. This property may be of independent interest.

Matching many vertices of a random subset of A. For a fixed parameter $0 < p \le 1$,
the following algorithm generates an independent random sample of vertices $A' \subseteq A$ such that
$\Pr[a \in A'] = p$, for all $a \in A$. Theorem 2 shows that the greedy algorithm restricted to the edges
with an endpoint in $A'$ will output a matching of expected approximation ratio $p/(1+p)$, compared
to a maximum matching $\mathrm{opt}(G)$ over the full graph $G$. Since, in expectation, the size of $A'$ is $p|A|$,
one can roughly say that a fraction of $1/(1 + p)$ of the vertices in $A'$ has been matched.

Algorithm 2 Matching a random subset
1: Take independent random sample $A' \subseteq A$ s.t. $\Pr[a \in A'] = p$, for all $a \in A$
2: Let $F$ be the complete bipartite graph between $A'$ and $B$
3: return $M' = \mathrm{Greedy}(F \cap \pi)$
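A minimal Python sketch of Algorithm 2 follows. The function name `match_random_subset`, the explicit `rng` argument, and the choice to return the sample together with the matching are assumptions made for illustration; the A vertices are passed as a list so that the sampling order is fixed for a given seed.

```python
import random


def match_random_subset(stream, a_vertices, p, rng):
    """Sample A' by keeping each A vertex independently with probability p,
    then run Greedy on the edges of the stream that have their A endpoint in A'."""
    sample = {a for a in a_vertices if rng.random() < p}
    matched, matching = set(), []
    for a, b in stream:
        if a in sample and a not in matched and b not in matched:
            matching.append((a, b))
            matched.update((a, b))
    return sample, matching


edges = [("a1", "b1"), ("a2", "b1"), ("a2", "b2"), ("a3", "b3")]
a_side = ["a1", "a2", "a3"]
sample, m_prime = match_random_subset(edges, a_side, 0.5, random.Random(7))
print(sample, m_prime)
```

With `p = 1` the sample is all of A and the procedure is the plain Greedy algorithm, for which the guarantee p/(1 + p) of Theorem 2 gives the familiar factor 1/2.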
The proof of Theorem 2 will use Wald's equation for super-martingales, see [2], Wald's
Equation, p. 300, section 12.3.²

Lemma 9 (Wald's equation). Consider a process described by a sequence of random states $(S_i)_{i \ge 0}$
and let $D$ be a random stopping time for the process, such that $\mathbb{E} D < \infty$. Let $(\Phi(S_i))_{i \ge 0}$ be a
sequence of random variables for which there exist $c, \mu$ such that

1. $\Phi(S_0) = 0$;

2. $\Phi(S_{i+1}) - \Phi(S_i) < c$ for all $i < D$; and

3. $\mathbb{E}[\Phi(S_{i+1}) - \Phi(S_i) \mid S_i] \le \mu$ for all $i < D$.

Then:

$$\mathbb{E}\, \Phi(S_D) \le \mu\, \mathbb{E} D.$$

Theorem 2. Let $0 < p \le 1$, let $G = (A, B, E)$ be a bipartite graph. Let $A'$ be an independent
random sample $A' \subseteq A$ such that $\Pr[a \in A'] = p$, for all $a \in A$. Let $F$ be the complete bipartite
graph between $A'$ and $B$. Then for any input stream $\pi \in \Pi(G)$:

$$\mathbb{E}_{A'} |\mathrm{Greedy}(F \cap \pi)| \ge \frac{p}{1 + p}\, |\mathrm{opt}(G)|.$$

Proof. Let $M' = \mathrm{Greedy}(F \cap \pi)$. For $i \le |M'|$, denote by $M'_i$ the first $i$ edges of $M'$, in the order
in which they were added to $M'$ during the execution of Greedy.

Let $M^*$ be a fixed maximum matching in $G$ and let $M_F$ denote the edges of $M^*$ that are in
$F$. Let $A'' = A(M_F)$ denote the vertices of $A'$ matched by $M_F$. Consider a vertex $a \in A''$ and its
match $b$ in matching $M_F$. We say that $a$ is live with respect to $M'_i$ if both $a$ and $b$ are unmatched
in $M'_i$. A vertex that is not live is dead. Furthermore, we say that an edge of $M'_{i+1} \setminus M'_i$ kills a
vertex $a$ if $a$ is live with respect to $M'_i$ and dead with respect to $M'_{i+1}$.

We use Lemma 9. Here, by "time", we mean the number of edges in $M'$, so between time $i - 1$
and time $i$, during the execution of Greedy, several edges arrive and all are rejected except the last
one which is added to $M'$. We use a potential function $\phi(i)$ which we define as the number of dead
vertices w.r.t. $M'_i$. We define the stopping time $D$ as the first time when the event $\phi(i) = |A''|$ holds.

We only need to check that the three assumptions of the Stopping Lemma hold. First, initially
all nodes of $A''$ are live, so $\phi(0) = 0$. Second, the potential function $\phi$ is non-decreasing and
uniformly bounded: since adding an edge to $M'$ can kill at most two vertices of $A''$, we always have
$\Delta\phi(i) := \phi(i + 1) - \phi(i) \le 2$. Third, let $S_i$ denote the state of the process at time $i$, namely the
information about the entire sequence of edge arrivals up to that time, hence, in particular, the set
of $i$ edges currently in $M'$. Observe that, here, $G$ and $M^*$ are fixed. Then $D$ is indeed a stopping
time, since the event $D \ge i + 1$ can be inferred from the knowledge of $S_i$.

We now claim that:

$$\mathbb{E}(\Delta\phi(i) \mid S_i) \le 1 + p. \qquad (1)$$

Indeed, since $\Delta\phi(i)$ only takes on values 0, 1 or 2, we can write that
$\mathbb{E}(\Delta\phi(i) \mid S_i) \le 1 + \Pr[\Delta\phi(i) = 2 \mid S_i]$. To bound the latter probability, let $e = ab$ denote the edge of $M'_{i+1} \setminus M'_i$, let $t$ be such that
$e = \pi[t]$, and let $a'$ be the mate of $b$ in matching $M^*$. In order for $e$ to change $\phi$ by 2, it must be
that $a'$ is in $A'$ and that $a'$ was unmatched before edge $e$ arrived. Since $a'$ was unmatched up to
arrival $t$, no edge $a'b'$ had been seen among the first $t$ edges of stream $\pi$, such that $b'$ was free at
arrival time (of $a'b'$). Thus

$$\Pr[\Delta\phi(i) = 2 \mid S_i] \le \Pr[a' \in A' \text{ and } \nexists\, a'b' \in \pi[1, t] \text{ s.t. } b' \text{ was free when } a'b' \text{ arrived} \mid S_i].$$

Now, given that no edge $a'b'$ arrived before $t$ such that $b'$ was free when $a'b'$ arrived, the
outcome of the random coin determining whether $a' \in A'$ was never looked at, and could have been
postponed until $t$. Thus

$$\Pr[a' \in A' \mid (\nexists\, a'b' \in \pi[1, t] \text{ such that } b' \text{ was free when } a'b' \text{ arrived},\ S_i)] = \Pr[a' \in A'] = p,$$

implying Inequality (1). Applying Wald's Stopping Lemma, we obtain $\mathbb{E}\phi(D) \le (1 + p)\,\mathbb{E} D$.
Finally, observe that $\mathbb{E}\phi(D) = \mathbb{E}|A''| = p \cdot |\mathrm{opt}(G)|$ and that $D \le |\mathrm{Greedy}(F \cap \pi)|$, and the
Theorem follows.

□

² The theorem cited in the book is actually weaker than the one we need, but our statement follows from the proof
of that Theorem. Another source is available online at http://greedyalgs.info/blog/stopping-times/

Application: a randomized two-pass algorithm. Based on Theorem 2, we design our
randomized two-pass algorithm. Assume that $\mathrm{Greedy}(\pi)$ returns a matching that is close to a
$\frac12$-approximation. In order to apply Theorem 2, we pick an independent random sample $A' \subseteq A$ such
that $\Pr[a \in A'] = p$ for all $a$. In a first pass, our algorithm computes a Greedy matching $M_0$ of $G$,
and a Greedy matching $M'$ between vertices of $A'$ and $B$. $M'$ then contains some edges that form
parts of 3-augmenting paths for $M_0$: see Figures 2 and 3 in Appendix B for an illustration.
Let $M_1 \subseteq M'$ be the set of those edges. It remains to complete these length 2 paths $M_0 \cup M_1$ in a
second pass by a further Greedy matching $M_2$. Theorem 3 then states that if $\mathrm{Greedy}(\pi)$ is close
to a $\frac12$-approximation, then we find many 3-augmenting paths.

Algorithm 3 Two-pass bipartite matching

1: Let $p \leftarrow \sqrt{2} - 1$.
2: Take an independent random sample $A' \subset A$ s.t. $\Pr[a \in A'] = p$, for all $a \in A$
3: Let $F_1$ be the set of edges with one endpoint in $A'$.
4: First pass: $M_0 \leftarrow$ Greedy($\pi$) and $M' \leftarrow$ Greedy($F_1 \cap \pi$)
5: $M_1 \leftarrow \{e \in M' \mid e \text{ goes between } B(M_0) \text{ and } \overline{A(M_0)}\}$
6: $A_2 \leftarrow \{a \in A(M_0) : \exists b, c : ab \in M_0 \text{ and } bc \in M_1\}$.
7: Let $F_2 \leftarrow \{da : d \in \overline{B(M_0)} \text{ and } a \in A(M_0) \text{ and } \exists b, c : ab \in M_0 \text{ and } bc \in M_1\}$.
8: Second pass: $M_2 \leftarrow$ Greedy($F_2 \cap \pi$)
9: Augment $M_0$ by edges in $M_1$ and $M_2$ and store it in $M$
10: return the resulting matching $M$
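
The steps of Algorithm 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stream is a list of `(a, b)` pairs that is simply iterated again for each Greedy call (a real semi-streaming implementation would maintain $M_0$ and $M'$ side by side during the first pass), and the names `greedy`, `two_pass_matching`, the `allowed` filter and the `rng` parameter are hypothetical.

```python
import math
import random

def greedy(stream, allowed=lambda a, b: True):
    """Greedy matching over the edges (a, b) of the stream accepted by `allowed`."""
    used_a, used_b, matching = set(), set(), []
    for a, b in stream:
        if allowed(a, b) and a not in used_a and b not in used_b:
            used_a.add(a)
            used_b.add(b)
            matching.append((a, b))
    return matching

def two_pass_matching(stream, a_vertices, rng):
    p = math.sqrt(2) - 1
    sample = {a for a in a_vertices if rng.random() < p}     # A'
    # First pass: M0 = Greedy(pi) and M' = Greedy(F1 ∩ pi).
    m0 = greedy(stream)
    m_prime = greedy(stream, lambda a, b: a in sample)
    a_m0 = {a for a, _ in m0}
    b_m0 = {b for _, b in m0}
    # M1: edges of M' between a matched B vertex and an unmatched A vertex (b -> c).
    m1 = {b: c for c, b in m_prime if b in b_m0 and c not in a_m0}
    # A2: matched A vertices whose mate carries an M1 edge.
    a2 = {a for a, b in m0 if b in m1}
    # Second pass: Greedy on F2, the edges between A2 and B vertices free in M0.
    m2 = greedy(stream, lambda a, d: a in a2 and d not in b_m0)
    # Augment along each path c - b - a - d: drop ab, add ad and cb.
    mate = dict(m0)
    result = set(m0)
    for a, d in m2:
        b = mate[a]
        result.remove((a, b))
        result.update({(a, d), (m1[b], b)})
    return result

# Hypothetical stream: the path c - b - a - d, with the middle edge arriving first.
stream = [("a", "b"), ("c", "b"), ("a", "d")]
print(two_pass_matching(stream, ["a", "c"], random.Random(0)))
```

On this toy stream the output depends on the sample: the augmentation succeeds only when $c \in A'$ and $a \notin A'$, which is exactly the situation in which $M'$ contains the edge $cb$.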



Theorem 3. Algorithm 3 is a randomized two-pass semi-streaming algorithm for MBM with
approximation ratio $\frac{1}{2} + 0.019$ in expectation over its internal random coin flips for any graph
and any arrival order, and can be implemented with $O(1)$ processing time per edge.

Proof. By construction, each edge in $M_2$ is part of a 3-augmenting path, hence the output has size:

$$|M| = |M_0| + |M_2|.$$

Define $\epsilon$ to be such that $|M_0| = (\frac{1}{2} + \epsilon)|\mathrm{opt}(G)|$. Since $M_2$ is a maximal matching of $F_2$, we
have $|M_2| \ge \frac{1}{2}|\mathrm{opt}(F_2)|$. Let $M^*$ be a maximum matching of $G$. Then $|\mathrm{opt}(F_2)|$ is greater than or
equal to the number of edges $ab$ of $M_0$ such that there exists an edge $bc$ of $M_1$ and an edge $da$ of
$M^*$ that altogether form a 3-augmenting path of $M_0$:

$$|\mathrm{opt}(F_2)| \ge |\{ab \in M_0 \mid \exists c : bc \in M_1 \text{ and } \exists d : da \in M^*\}|$$
$$\ge |\{ab \in M_0 \mid \exists c : bc \in M_1\}| - |\{ab \in M_0 \mid ab \text{ not 3-augmentable}\}|.$$

Lemma 2 gives $|\{ab \in M_0 \mid ab \text{ is not 3-augmentable with } M^*\}| \le 4\epsilon|\mathrm{opt}(G)|$. It remains to
bound $|\{ab \in M_0 \mid \exists c : bc \in M_1\}|$ from below. By definition of $M'$ and of $M_1 \subset M'$, and by
maximality of $M_0$,

$$|\{ab \in M_0 \mid \exists c : bc \in M_1\}| = |M'| - |\{ab \in M' \mid a \in A(M_0)\}| \ge |M'| - |A(M_0) \cap A'|.$$

Taking expectations, by Theorem 2 and by independence of $M_0$ from $A'$:

$$E|M'| - E|A(M_0) \cap A'| \ge \frac{p}{1+p}|\mathrm{opt}(G)| - p\left(\frac{1}{2} + \epsilon\right)|\mathrm{opt}(G)|.$$

Combining:

$$E|M| \ge \left(\frac{1}{2} + \epsilon\right)|\mathrm{opt}(G)| + \frac{1}{2}\left(|\mathrm{opt}(G)|\, p\left(\frac{1}{1+p} - \frac{1}{2} - \epsilon\right) - 4\epsilon|\mathrm{opt}(G)|\right).$$

For $\epsilon$ small, the right hand side is maximized for $p = \sqrt{2} - 1$. Then $\epsilon \approx 0.019$ minimizes
$\max\{|M|, |M_0|\}$, which proves the theorem.

□ 
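
The constants in the proof of Theorem 3 can be verified numerically. The sketch below, in Python, evaluates the combined lower bound on $E|M|/|\mathrm{opt}(G)|$ and locates the value of $\epsilon$ where it crosses $|M_0|/|\mathrm{opt}(G)| = \frac{1}{2} + \epsilon$; the function name `guarantee` and the variable `eps_star` are assumptions introduced for this illustration.

```python
import math

def guarantee(p, eps):
    """Lower bound on E|M| / |opt(G)| from the combined inequality."""
    return 0.5 + eps + 0.5 * (p * (1 / (1 + p) - 0.5 - eps) - 4 * eps)

p = math.sqrt(2) - 1

# The bound decreases in eps, while |M0| / |opt(G)| = 1/2 + eps increases,
# so the worst case is the crossing point:
#   guarantee(p, eps) = 1/2 + eps  <=>  eps * (4 + p) = p * (1/(1+p) - 1/2)
eps_star = p * (1 / (1 + p) - 0.5) / (4 + p)

print(round(eps_star, 4))                 # crossing point, about 0.0194
print(round(guarantee(p, eps_star), 4))   # resulting ratio, about 0.5194
```

At $\epsilon = 0$ the bound is $\frac{1}{2} + \frac{1}{2}\,p\,(\frac{1}{1+p} - \frac{1}{2})$, whose derivative in $p$ vanishes when $(1+p)^2 = 2$, which is where the choice $p = \sqrt{2} - 1$ comes from.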

5 Deterministic two-pass algorithm on any order 

The deterministic two-pass algorithm, Algorithm 5, follows the same line as its randomized version,
Algorithm 3. In a first pass we compute a Greedy matching $M_0$ and some additional edges $S$,
computed by Algorithm 4. If $M_0$ is not much more than a $\frac{1}{2}$-approximation, then $S$ contains edges
that serve as parts of 3-augmenting paths. These are completed via a Greedy matching in the
second pass.

The way we compute the edge set $S$ is now different. Before, $S$ was a matching $M'$ between
$B$ and a random subset $A'$ of $A$. Now, $S$ is not a matching but a relaxation of matchings as
follows. Given an integer $\lambda \ge 2$, an incomplete $\lambda$-bounded semi-matching $S$ of a bipartite graph
$G = (A, B, E)$ is a subset $S \subseteq E$ such that $\deg_S(a) \le 1$ and $\deg_S(b) \le \lambda$, for all $a \in A$ and $b \in B$.
This notion is closely related to semi-matchings. A semi-matching matches all $A$ vertices to $B$
vertices without limitations on the degree of a $B$ vertex. However, since we require that the $B$
vertices have constant degree, we loosen the condition that all $A$ vertices need to be matched.

In Lemma 10 we show that Algorithm 4, a straightforward greedy algorithm, computes an
incomplete $\lambda$-bounded semi-matching that covers at least $\frac{\lambda}{\lambda+1}|M^*|$ vertices of $A$.

Lemma 10. Let $S = \mathrm{SEMI}(\lambda)$ be the output of Algorithm 4 for some $\lambda \ge 2$. Then $S$ is an
incomplete $\lambda$-bounded semi-matching such that $|A(S)| \ge \frac{\lambda}{\lambda+1}|M^*|$.






Algorithm 4 incomplete degree $\lambda$ limited semi-matching SEMI($\lambda$)
$S \leftarrow \emptyset$
while $\exists$ edge $ab$ in stream
    if $\deg_S(a) = 0$ and $\deg_S(b) \le \lambda - 1$ then $S \leftarrow S \cup \{ab\}$
return $S$
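
Algorithm 4 translates almost line by line into code. The Python sketch below is a hedged illustration: the representation of the stream as a list of `(a, b)` pairs, the function name `semi`, and the example stream are assumptions made here, not part of the paper.

```python
from collections import defaultdict

def semi(stream, lam):
    """Incomplete lam-bounded semi-matching: deg_S(a) <= 1 and deg_S(b) <= lam."""
    matched_a = set()          # A vertices with deg_S(a) = 1
    deg_b = defaultdict(int)   # deg_S(b) for every B vertex seen so far
    s = []
    for a, b in stream:
        if a not in matched_a and deg_b[b] <= lam - 1:
            matched_a.add(a)
            deg_b[b] += 1
            s.append((a, b))
    return s

# Hypothetical stream in which b1 is offered to three A vertices before the
# edges a3b3 and a2b2 of the maximum matching {a1b1, a2b2, a3b3} arrive.
stream = [("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a3", "b3"), ("a2", "b2")]
s = semi(stream, 2)
print(s)
```

With $\lambda = 2$ the edge $a_3b_1$ is rejected because $b_1$ already has degree 2, and $a_2b_2$ is rejected because $a_2$ is already covered; the state kept per vertex is a single flag or counter, which is what makes constant processing time per edge plausible.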





Proof. By construction, $S$ is an incomplete degree $\lambda$ bounded semi-matching. We bound
$|A(M^*) \setminus A(S)|$ from above. Let $a \in A(M^*) \setminus A(S)$ and let $b$ be its mate in $M^*$. The algorithm
did not add the optimal edge $ab$ upon its arrival. Since $a$ is never matched in $S$, this implies
that $b$ was already matched to $\lambda$ other vertices. Hence, $|A(M^*) \setminus A(S)| \le \frac{1}{\lambda}|A(S)|$. Then the result
follows by combining this inequality with $|M^*| - |A(S)| \le |A(M^*) \setminus A(S)|$.

□ 

Now, assume that the greedy matching algorithm computes a matching $M_0$ close to a $\frac{1}{2}$-approximation.
Then, for $\lambda \ge 2$ there are many $A$ vertices that are not matched in $M_0$ but are matched in $S$. Edges
incident to those in $S$ are candidates for the construction of 3-augmenting paths. This argument
is made rigorous in Theorem 4, leading to Algorithm 5, where $\lambda$ is set to 3.

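Algorithm 5 two-pass deterministic algorithm
First pass: $M_0 \leftarrow$ Greedy($\pi$) and $S \leftarrow \mathrm{SEMI}(3)$
$M_1 \leftarrow \{e \in S \mid e \text{ is between } B(M_0) \text{ and } \overline{A(M_0)}\}$
$A_2 \leftarrow \{a \in A(M_0) \mid \exists bc : ab \in M_0 \text{ and } bc \in M_1\}$
$F \leftarrow \{e \mid e \text{ goes between } A_2 \text{ and } \overline{B(M_0)}\}$
Second pass: $M_2 \leftarrow$ Greedy($\pi \cap F$)
Augment $M_0$ by edges in $M_1$ and $M_2$ and store it in $M$
return $M$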
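
A compact Python sketch of Algorithm 5 is given below for illustration. It rests on assumptions that are not in the paper: the stream is a list of `(a, b)` pairs that is re-read for each subroutine (a streaming implementation would run Greedy and SEMI together in the first pass), and the names `greedy`, `semi`, `two_pass_deterministic` and the `allowed` filter are hypothetical.

```python
from collections import defaultdict

def greedy(stream, allowed=lambda a, b: True):
    """Greedy matching over the edges (a, b) of the stream accepted by `allowed`."""
    used_a, used_b, matching = set(), set(), []
    for a, b in stream:
        if allowed(a, b) and a not in used_a and b not in used_b:
            used_a.add(a)
            used_b.add(b)
            matching.append((a, b))
    return matching

def semi(stream, lam):
    """Incomplete lam-bounded semi-matching (Algorithm 4)."""
    matched_a, deg_b, s = set(), defaultdict(int), []
    for a, b in stream:
        if a not in matched_a and deg_b[b] <= lam - 1:
            matched_a.add(a)
            deg_b[b] += 1
            s.append((a, b))
    return s

def two_pass_deterministic(stream, lam=3):
    # First pass: M0 = Greedy(pi) and S = SEMI(lam).
    m0 = greedy(stream)
    s = semi(stream, lam)
    a_m0 = {a for a, _ in m0}
    b_m0 = {b for _, b in m0}
    # M1: edges of S between a matched B vertex and an unmatched A vertex.
    m1 = defaultdict(list)
    for c, b in s:
        if b in b_m0 and c not in a_m0:
            m1[b].append(c)
    # A2: matched A vertices whose mate carries at least one M1 edge.
    a2 = {a for a, b in m0 if b in m1}
    # Second pass: Greedy on F, the edges between A2 and B vertices free in M0.
    m2 = greedy(stream, lambda a, d: a in a2 and d not in b_m0)
    # Augment along c - b - a - d, using any one M1 edge at b.
    mate = dict(m0)
    result = set(m0)
    for a, d in m2:
        b = mate[a]
        result.remove((a, b))
        result.update({(a, d), (m1[b][0], b)})
    return result

# Hypothetical stream: b1 receives two S edges, only one of which is needed.
stream = [("a1", "b1"), ("c1", "b1"), ("c2", "b1"), ("a1", "d1")]
print(two_pass_deterministic(stream))
```

Because every $A$ vertex has degree at most one in $S$, the vertices $c$ chosen for different mates $b$ are automatically distinct, so the augmented edge set is again a matching.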



Theorem 4. Algorithm 5 is a deterministic two-pass semi-streaming algorithm for MBM with
approximation ratio $\frac{1}{2} + 0.019$ for any graph and any arrival order, and can be implemented with
$O(1)$ processing time per edge.

Proof. The computed matching $M$ is of size $|M_0| + |M_2|$ since, by construction, for each edge in
$M_2$ there is at least one distinct edge in $M_1$ that allows the construction of a 3-augmenting path.
Each 3-augmenting path increases the matching $M_0$ by 1. See also Figure 5 in Appendix C. Since
$M_2$ is a maximal matching of the graph induced by the edges $F$, we obtain

$$|M| \ge |M_0| + \frac{1}{2}\,\mathrm{opt}(F).$$

Let $\epsilon$ be such that $|M_0| = (\frac{1}{2} + \epsilon)|M^*|$. By Lemma 2, at most $4\epsilon|M^*|$ edges of $M_0$ are not
3-augmentable, hence

$$\mathrm{opt}(F) \ge |A_2| - 4\epsilon|M^*|.$$

$A_2$ are those vertices matched also by $M_0$ such that there exists an edge in $M_1$ matching the
mate of the $A_2$ vertex. Since the maximal degree in $M_1$ is $\lambda$, we can bound $|A_2|$ by

$$|A_2| \ge \frac{1}{\lambda}|M_1|.$$

Note that $|M_1| = |A(S) \setminus A(M_0)|$ since the degree of an $A$ vertex matched by $S$ in $S$ is one, and $S$
can be partitioned into $S_M, S_{\overline{M}}$ such that edges in $S_M$ couple an $A$ vertex also matched in $M_0$,
and edges in $S_{\overline{M}}$ couple an $A$ vertex that is not matched in $M_0$. Now, $|M_1| = |S_{\overline{M}}|$ since an edge
of $S$ is taken into $M_1$ if it is in $S_{\overline{M}}$.

Lemma 10 allows us to bound the size of the set $A(S) \setminus A(M_0)$ via

$$|A(S) \setminus A(M_0)| \ge |A(S)| - |A(M_0)| \ge \left(\frac{\lambda}{\lambda+1} - \frac{1}{2} - \epsilon\right)|M^*|.$$

Using the prior inequalities, we obtain

$$|M| \ge \left(\frac{1}{2} - \epsilon + \frac{1}{2\lambda+2} - \frac{1}{4\lambda} - \frac{\epsilon}{2\lambda}\right)|M^*|.$$

Since we also have $|M| \ge |M_0| = (\frac{1}{2} + \epsilon)|M^*|$, we set

$$\epsilon_0 = \arg\min_{\epsilon} \max\left\{\left(\frac{1}{2} - \epsilon + \frac{1}{2\lambda+2} - \frac{1}{4\lambda} - \frac{\epsilon}{2\lambda}\right)|M^*|,\ \left(\frac{1}{2} + \epsilon\right)|M^*|\right\} = \frac{\lambda - 1}{8\lambda^2 + 10\lambda + 2},$$

which is maximized for $\lambda = 3$, leading to an approximation factor of $\frac{1}{2} + \frac{1}{52} \approx \frac{1}{2} + 0.019$.
Concerning the processing time per edge, note that once an edge is added in the second pass,
a corresponding 3-augmenting path can be determined in time $O(1)$.

□ 
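
The choice $\lambda = 3$ and the constant $\frac{1}{52}$ can be double-checked with exact rational arithmetic. The Python sketch below is an illustration only; the function names `eps0` and `bound`, and the search range for $\lambda$, are assumptions made for this example.

```python
from fractions import Fraction

def eps0(lam):
    """Crossing point (lam - 1) / (8 lam^2 + 10 lam + 2) from the proof."""
    return Fraction(lam - 1, 8 * lam**2 + 10 * lam + 2)

def bound(lam, eps):
    """Lower bound on |M| / |M*|: 1/2 - eps + 1/(2 lam + 2) - 1/(4 lam) - eps/(2 lam)."""
    lam = Fraction(lam)
    return Fraction(1, 2) - eps + 1 / (2 * lam + 2) - 1 / (4 * lam) - eps / (2 * lam)

# At eps0 the two lower bounds on |M| / |M*| coincide.
for lam in range(2, 8):
    assert bound(lam, eps0(lam)) == Fraction(1, 2) + eps0(lam)

# Search a finite range of integer lambda for the best guarantee.
best_lam = max(range(2, 51), key=eps0)
print(best_lam, eps0(best_lam), float(eps0(best_lam)))
```

Using `Fraction` rather than floating point makes the equality test at the crossing point exact, so the closed form for $\epsilon_0$ is confirmed symbolically for each tested $\lambda$ rather than up to rounding.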
A Figure for the one-pass algorithm on random order



Figure 1 illustrates Algorithm 1.

[Figure 1: drawing not reproduced. Legend: vertex $\in A'$; edge $\in M^*$; edge $\in M_1$ or $M_2$; other edges of $M_0$.]

Figure 1: Illustration of Algorithm 1. Note that every edge of $M_2$ completes a 3-augmenting path
consisting of one edge of $M_1$ (on the right hand side of the picture) followed by one edge of $M_0$
(center) followed by one edge of $M_2$ (on the left hand side of the picture).



B Figures for the randomized two-pass algorithm 

We provide two figures illustrating the first pass (Figure 2) and the second pass (Figure 3) of
Algorithm 3.

[Figure 2: drawing not reproduced. Vertex sets $B$ and $A$, with the 3-augmentable edges marked. Legend: vertex $\in A'$; edge $\in M_1$; edge $\in M' \setminus M_1$; edge $\in M^*$; other edges.]

Figure 2: Illustration of the first pass of Algorithm 3. By Theorem 2, nearly all vertices of $A'$ are
matched in $M'$, in particular those that are not matched in $M_0$.

[Figure 3: drawing not reproduced. Shown: $M_1 \subset M'$. Legend: edges forming a large matching of $F_2$; vertices in $A_2$. The edges of $M' \setminus M_1$ are not drawn.]

Figure 3: Analysis of the second pass of Algorithm 3. Here, we see that $M_0 \oplus M_1$ has two paths
of length 2, and that both of those paths can be extended into 3-augmenting paths using $M^*$:
this illustrates $|\mathrm{opt}(F_2)| \ge 2$. Matching $M_2$, being a $\frac{1}{2}$-approximation, will find at least one
3-augmenting path.

C Figures for the deterministic two-pass algorithm

We show two figures illustrating the first pass (Figure 4) and the second pass (Figure 5) of
Algorithm 5.

[Figure 4: drawing not reproduced. Vertex sets $B$ and $A$, with the 3-augmentable edges marked. Legend: edge $\in M_1$; edge $\in S \setminus M_1$; edge $\in M^*$; other edges.]

Figure 4: Illustration of the first pass of Algorithm 5. In this example we set $\lambda = 2$ and we
compute an incomplete degree 2 limited semi-matching $S$. By Lemma 10, we match at least $\frac{2}{3}|M^*|$
$A$ vertices. Since $|M_0| \approx \frac{1}{2}|M^*|$, some $A$ vertices that are not matched in $M_0$ are matched in $S$.
The edges incident to those define $M_1$.

[Figure 5: drawing not reproduced. Legend: edges forming a maximum matching of $F_2$; vertices in $A_2$. The edges of $S \setminus M_1$ are not drawn.]

Figure 5: Analysis of the second pass of Algorithm 5. Here, we see that $M_0 \oplus M_1$ has five paths
of length 2. These paths are not disjoint, but since the maximal degree in $S$ is 2, $M_0 \oplus M_1$ has at
least $\frac{1}{2} \cdot 5$ disjoint paths, and hence $|A_2| = 3 \ge \frac{1}{2} \cdot 5$. A maximum matching in $F_2$ is of size 3, and in
the second pass, Greedy will find at least half of them, leading to at least two 3-augmenting paths.